The moments after an earthquake are critical. Early detection of and early warning about earthquakes can save property and lives. The United States Geological Survey (USGS) developed ShakeMap to assist public and private organizations and individuals with post-earthquake response and recovery. The USGS distributes ShakeMaps within minutes of an earthquake. These maps and their associated data show the shaking intensity across the affected area, and they are updated as additional information comes in (Wald 2005).
With the help of scientists, engineers and public officials, and in large part because of improved early warning systems and recovery tools like ShakeMap, damages and losses associated with earthquakes have declined. ShakeMaps are distributed minutes after an earthquake, but is there a system or platform that can detect an earthquake and send out warnings within seconds? This report examines whether Twitter data could be the answer.
Twitter has over 330 million monthly active users (Twitter 2017). This micro-blogging service has become a communication platform for millions of people around the world. Twitter data is publicly available through the Twitter API. In addition to the text of each tweet, the API provides the time at which the tweet was sent and, if the user chose to share it, the tweet's geolocation. By examining the volume of tweets and the locations from which they were sent, this report assesses whether Twitter data can be used to create shaking-intensity maps.
This report focuses on the South Napa, California earthquake of August 24, 2014. This Mw 6.0 earthquake struck at 3:20 a.m. PDT (4:20 a.m. Mountain Time) about 30 miles north-northeast of San Francisco, just outside of Napa. ShakeMap reported shaking intensities as high as 8 on the Modified Mercalli Intensity (MMI) scale near the epicenter. For this study, we examine the tweets sent immediately after the earthquake from within 70 miles of the epicenter.
Figure 1. Epicenter of the August 24, 2014 South Napa earthquake. The blue box shows the boundaries for the area of interest for this report.
Over the past decade, several researchers have examined how Twitter data can be used in earthquake analysis. They cite the volume of accessible, real-time, geolocated data as the primary reason for considering Twitter as a data source. In their study of the 2011 Mineral, Virginia earthquake, Crooks, Croitoru, Stefanidis and Radzikowski (2012) examined over 20,000 geolocated tweets collected within 8 hours of the earthquake, while Kropivnitskaya, Tiampo, Qin and Bauer worked with over 1.8 gigabytes of data collected within 24 hours of the 2014 South Napa earthquake.
Several studies examine whether an intensity map like ShakeMap can be created from Twitter data alone. Kropivnitskaya et al. found that the logarithm of the number of tweets can be used as a “proxy” for shaking intensity. Using data from three earthquakes in Japan, Burks, Miller and Zadeh (2014) combined quantitative characteristics of Twitter clusters (e.g., tweet counts and average number of characters) with “earthquake-based” features (e.g., moment magnitude and community decimal intensity) to build multiple regression models. They found that the models that best estimated shaking intensity included both Twitter and “earthquake-based” data. Crooks et al. examined tweet rates by distance from the epicenter and concluded that, while Twitter data could not replicate a large-scale shaking-intensity map, it could support recovery efforts because the data was “dense at the right places,” that is, where people are. The earliest study in this review, by Earle, Bowden and Guy (2011), compared global frequencies of earthquake-related tweets with earthquake events. They determined that Twitter data was not well suited to global earthquake detection but could be used to issue warnings across densely populated areas.
The conclusion from these studies is that Twitter can supplement and enhance seismic sensor data and field reports, but that its best use is not in estimating shaking; rather, it is in alerting first responders to the areas most in need of help and in disseminating information.
All of the studies noted limitations of Twitter data, including restrictions on the amount of data that can be queried from the Twitter API and the fact that less than 1% of all tweets were geolocated. However, these issues were not seen as insurmountable, given the significant amount of data that is still available.
Having concluded that Twitter data is most valuable when used in conjunction with sensor data, each of the studies suggested that Twitter could be used in areas that cannot support a network of sensors, a seemingly “something is better than nothing” approach.
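Kropivnitskaya et al.'s finding can be summarized with a generic log-linear relationship. The equation below is an illustrative sketch, not the equation fitted in their papers: I is the estimated shaking intensity (MMI) at a location, N is the number of tweets from that location within a fixed time window, and a and b are placeholder coefficients that would have to be estimated by regression against observed intensities.

```latex
I \approx a + b \log_{10} N
```

Under this form, each tenfold increase in tweet count raises the estimated intensity by b units.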
The study area lies within the northern portion of the San Francisco Bay Area. The earthquake occurred in the West Napa Fault zone. The epicenter [Lat. 38.22N, Long. 122.31W] was located 6 miles south-southwest of Napa near the north shore of San Pablo Bay. The bayshore areas of the San Francisco Bay region are underlain by landfill and bay mud (Johnson and Mahin 2016).
ShakeMap is a product of the USGS Earthquake Hazards Program. Using ground-motion observations from seismic stations, responses to the Did You Feel It? system, and field surveys, ShakeMap displays ground motion and shaking intensity on the Modified Mercalli Intensity (MMI) scale (Wald 2005). Because of its size and location, the South Napa earthquake has been the subject of many recent studies, and to support them the USGS provides ShakeMap data for the event in multiple formats. For this report, we used raster files and shapefiles for MMI information and a comma-delimited text (.csv) file for seismic station locations. These files required little manual manipulation. We reprojected the raster file to match the projection of our other data and changed the data types of several factor columns. The raster files and shapefiles were opened and plotted using pre-built functions from R packages. For consistency, the time field in the .csv file was converted to Mountain Time using R's as.POSIXct function.
The Twitter data comes from CU Boulder’s Project EPIC, which provided a .csv file containing tweets from August 2012 through February 2017. These tweets were collected in real time through the Twitter API using a query for all tweets containing the word “earthquake” or similar terms. The tweets had already been partially processed: all of them had geolocation data, and no retweets were included.
Once we received the data, we transformed the TimeStamp and location fields to enable subsetting. Subsetting on the TimeStamp factor allowed us to create smaller dataframes. The geographic coordinates (latitude and longitude) were stored together in a single field of character data type. Using string-manipulation functions from R packages along with subsetting, we added numeric latitude and longitude columns to the Twitter dataframe. From the coordinates of the epicenter and the latitude and longitude of each tweet, we calculated each tweet's distance from the epicenter. We also used the as.numeric function to calculate the elapsed minutes, hours and days between the earthquake and each tweet.
We used the tidytext package for some text mining (a histogram of word frequencies and a text-based cluster analysis), but the results were not significant enough to include in this report.
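The time-zone step can be illustrated with a short Python sketch (the original work used R's as.POSIXct). It assumes, hypothetically, that the station file stores times as "YYYY-MM-DD HH:MM:SS" strings in UTC; the function name is illustrative. A fixed UTC-6 offset is used because Mountain Daylight Time applied throughout August 2014; data spanning a daylight-saving change would need a full time-zone database such as Python's zoneinfo.

```python
from datetime import datetime, timedelta, timezone

# Mountain Daylight Time (UTC-6) was in effect throughout August 2014, so a
# fixed offset is enough for this data set (an assumption; use a full
# time-zone database for data that crosses a daylight-saving change).
MDT = timezone(timedelta(hours=-6), "MDT")

def utc_string_to_mountain(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string known to be UTC and express it in MDT."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc).astimezone(MDT)

# The South Napa mainshock occurred at 10:20:44 UTC on 2014-08-24.
print(utc_string_to_mountain("2014-08-24 10:20:44"))  # 2014-08-24 04:20:44-06:00
```

Attaching UTC explicitly before converting avoids the common mistake of interpreting a naive timestamp in the machine's local time zone.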
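Project EPIC's collection pipeline is not described in this report, but the three properties of the delivered tweets (keyword match, geolocated, no retweets) can be expressed as a simple filter. This Python sketch assumes tweets arrive as dictionaries shaped like Twitter API v1.1 tweet objects, where retweets carry a retweeted_status field and coordinates is null for tweets without an exact location; the function name and keyword list are hypothetical.

```python
def keep_tweet(tweet, keywords=("earthquake",)):
    """Return True for an original, geotagged tweet whose text mentions a keyword."""
    # Retweets embed the original tweet under "retweeted_status".
    if "retweeted_status" in tweet:
        return False
    # "coordinates" is None (or absent) when the tweet has no exact location.
    if not tweet.get("coordinates"):
        return False
    text = tweet.get("text", "").lower()
    return any(word in text for word in keywords)


sample = [
    {"text": "Earthquake just woke me up!",
     "coordinates": {"type": "Point", "coordinates": [-122.42, 37.77]}},
    {"text": "RT earthquake in Napa", "retweeted_status": {}, "coordinates": None},
    {"text": "earthquake?", "coordinates": None},
]
kept = [t for t in sample if keep_tweet(t)]
print(len(kept))  # 1
```

Extending the keywords tuple (for example with "quake") approximates the "or similar" part of the query.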
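The coordinate-parsing, distance and elapsed-time steps can be sketched in Python (the original work used R). Assumptions: the location field looks like the hypothetical string "[38.2975, -122.2869]" with latitude first (raw Twitter GeoJSON stores longitude first, so the order must be checked against the actual data); distances are great-circle distances from the haversine formula on a sphere of radius 3,958.8 miles; and timestamps are timezone-aware datetimes. All function names are illustrative.

```python
import math
import re
from datetime import datetime, timezone

EPICENTER = (38.22, -122.31)   # (lat, lon) of the epicenter, from the text
EARTH_RADIUS_MI = 3958.8       # mean Earth radius in miles

def parse_coords(field):
    """Pull (lat, lon) out of a string such as '[38.2975, -122.2869]'.

    Assumes latitude comes first in the field.
    """
    nums = re.findall(r"-?\d+(?:\.\d+)?", field)
    return float(nums[0]), float(nums[1])

def haversine_miles(p1, p2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))

def elapsed_minutes(tweet_time, quake_time):
    """Minutes between the earthquake and a tweet (negative if the tweet came first)."""
    return (tweet_time - quake_time).total_seconds() / 60

quake = datetime(2014, 8, 24, 10, 20, 44, tzinfo=timezone.utc)
tweet = datetime(2014, 8, 24, 10, 21, 5, tzinfo=timezone.utc)   # hypothetical tweet
sf = parse_coords("[37.7749, -122.4194]")                         # central San Francisco
print(round(haversine_miles(EPICENTER, sf), 1), elapsed_minutes(tweet, quake))
```

The computed distance to central San Francisco comes out at roughly 31 miles, in line with the 30-mile figure used in this report.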
Clean-up and organization of Twitter data:
Looking at the data over successively shorter time intervals, we found that the first ten minutes after the earthquake had the highest rate of tweets within our area of interest. The first tweet went out 21 seconds after the earthquake.
Figure 2. The number of tweets over three time periods. Within the first day, the rate of tweets was highest in the first hour, and within the first hour it was highest in the first 10 minutes after the earthquake.
The majority of tweets came from within a 50-mile radius of the epicenter. There is a noticeable cluster of tweets 30 miles from the epicenter; not coincidentally, the densely populated city of San Francisco lies 30 miles south-southwest of the epicenter.
Figure 3. Tweet distance from the epicenter in the first 10 minutes after the earthquake. The majority of tweets came from within a 75-mile radius (bottom left). There is a cluster of tweets 30 miles from the epicenter, which is apparent throughout the 10 minutes (bottom right).
When the tweets are mapped, four large clusters with high numbers of tweets appear. The tweets are shown in relation to the shaking intensity reported by ShakeMap. Only one cluster (218 tweets) lies near the epicenter, in an area with a shaking intensity greater than 6. The largest cluster (1,384 tweets) is in San Francisco, where shaking intensity reached only 4. Similarly, a relatively high volume of tweets (405) came from San Jose, which lies at the edge of the ShakeMap; parts of San Jose fall outside the ShakeMap because little or no shaking was felt there.
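Comparing tweet rates over successively shorter windows amounts to counting the tweets whose elapsed time falls inside each window and dividing by the window length. A minimal Python sketch, assuming elapsed times in minutes have already been computed; the function names are illustrative and the sample values are made up, not the study's data.

```python
def window_rates(elapsed_min, windows=(1440, 60, 10)):
    """Map each window length (minutes) to (tweet count, tweets per minute)."""
    result = {}
    for w in windows:
        n = sum(1 for m in elapsed_min if 0 <= m < w)
        result[w] = (n, n / w)
    return result

def first_tweet_seconds(elapsed_min):
    """Delay in seconds between the earthquake and the first tweet after it."""
    return min(m for m in elapsed_min if m >= 0) * 60

sample = [0.35, 1.2, 2.5, 4.0, 7.8, 25.0, 180.0]   # hypothetical elapsed minutes
rates = window_rates(sample)
for w, (n, r) in rates.items():
    print(f"first {w:>4} min: {n} tweets, {r:.3f} tweets/min")
print(first_tweet_seconds(sample))
```

Normalizing by window length is what makes the day, hour and ten-minute windows comparable; raw counts always grow with the window.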
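Observations such as most tweets falling within a given radius, or a cluster appearing at 30 miles, come from binning tweet distances. A minimal Python sketch, assuming distances in miles have already been computed (for example with a great-circle formula); the function names, bin width and sample distances are hypothetical.

```python
from collections import Counter

def distance_bins(distances_mi, bin_width=10):
    """Count tweets per distance band; key 30 means [30, 40) miles, and so on."""
    return Counter(int(d // bin_width) * bin_width for d in distances_mi)

def share_within(distances_mi, radius_mi):
    """Fraction of tweets at or inside radius_mi of the epicenter."""
    if not distances_mi:
        return 0.0
    return sum(d <= radius_mi for d in distances_mi) / len(distances_mi)

sample = [4.1, 12.7, 28.9, 30.5, 31.3, 33.0, 47.2, 68.0]   # hypothetical distances
print(sorted(distance_bins(sample).items()))
print(share_within(sample, 50))
```

A narrower bin width sharpens a cluster such as the one at 30 miles but makes each band noisier when tweet counts are small.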
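Relating each tweet to the ShakeMap intensity at its location is a point-in-raster lookup. In practice a raster library performs this step; the plain-Python sketch below only shows the arithmetic, assuming a north-up grid stored as a list of rows, with grid[0][0] as the north-west cell, square cells of cell_deg degrees, and west and north giving the grid's outer edges. The function names, grid values and points are toy examples, not ShakeMap output.

```python
from collections import Counter

def mmi_at(lat, lon, grid, west, north, cell_deg):
    """Return the grid value of the cell containing (lat, lon), or None if outside.

    Assumes a north-up grid: grid[0][0] is the north-west cell, rows step
    south and columns step east by cell_deg degrees.
    """
    row = int((north - lat) // cell_deg)
    col = int((lon - west) // cell_deg)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None

def tweets_per_mmi(points, grid, west, north, cell_deg):
    """Count tweets by the intensity value of the cell each one falls in."""
    counts = Counter()
    for lat, lon in points:
        mmi = mmi_at(lat, lon, grid, west, north, cell_deg)
        if mmi is not None:
            counts[mmi] += 1
    return counts

# Toy 2 x 2 grid covering 38.0-39.0N, 123.0-122.0W in 0.5-degree cells.
toy_grid = [[5, 8],
            [4, 6]]
tweets = [(38.8, -122.3), (38.2, -122.8), (38.6, -122.4), (37.0, -122.0)]
print(tweets_per_mmi(tweets, toy_grid, west=-123.0, north=39.0, cell_deg=0.5))
```

Tweets outside the grid are dropped rather than assigned an intensity, which mirrors the San Jose tweets that fall beyond the ShakeMap boundary.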
Figure 4. Tweet volume in relation to ShakeMap shaking intensity. The heaviest shaking appears near the epicenter while the highest volume of tweets appears in San Francisco.
Our data did not show a direct correspondence between the number of tweets in an area and the shaking intensity of that area. Although tweet volume tracked shaking intensity in the Napa area, in densely populated areas the number of tweets reflected population size rather than shaking intensity. That does not mean there is no value in looking at Twitter data during and immediately after an earthquake. The first tweet went out just 21 seconds after the earthquake. Data available this quickly could be used in conjunction with ShakeMap data to give a more accurate picture of shaking closer to the time of the event. In a large-scale hazard event, early tweets could offer some warning to areas farther from the epicenter. We were unable to perform any substantive text analytics; had we done so, we would expect the content of the tweets to help first responders locate the areas where the hazard is affecting the largest number of people. Although Twitter cannot be used in isolation from other tools, both qualitative and quantitative Twitter data can aid earthquake detection and recovery.
Burks, L., M. Miller, and R. Zadeh (2014). Rapid estimate of ground shaking intensity by combining simple earthquake characteristics with tweets, Tenth U.S. National Conference on Earthquake Engineering: Frontiers of Earthquake Engineering, Anchorage, Alaska, 21-25 July 2014.
Crooks, A., A. Croitoru, A. Stefanidis, and J. Radzikowski (2012). Earthquake: Twitter as a distributed sensor system, Trans. GIS 17, no. 1, 124-147.
Earle, P., D. Bowden, and M. Guy (2011). Twitter earthquake detection: Earthquake monitoring in a social world, Ann. Geophys. 54, no. 6, 708-715.
Johnson, L., and S. Mahin (2016). The Mw 6.0 South Napa Earthquake of August 24, 2014: A Wake-up Call for Renewed Investment in Seismic Resilience across California, California Seismic Safety Commission and Pacific Earthquake Engineering Research Center, Berkeley, California, Report no. 2016-04.
Kropivnitskaya, Y., K. Tiampo, J. Qin, and M. Bauer (2017). The Predictive Relationship between intensity and tweets Rate for real-time ground-motion estimation, Seismological Research Letters 88, no. 3, 840-850, doi: 10.1785/0220160215
Kropivnitskaya, Y., K. Tiampo, J. Qin, and M. Bauer (2016). Real-time earthquake intensity estimation using streaming data analysis of social and physical sensors, Pure Appl. Geophys., 1-19, doi: 10.1007/s00024-016-1417-6.
Project EPIC, CU Boulder, US National Science Foundation, Grants IIS-0546315 & IIS-0910586 (2017). Twitter data set 2012-2017 [data set], Project EPIC [distributor], Boulder, Colorado.
Twitter (2017). https://about.twitter.com/company (last accessed May 9, 2017).
U.S. Geological Survey (2017). ShakeMap [maps and data], https://doi.org/10.5066/F7W957B2.
Wald, D. J. (2005). ShakeMap Manual: Technical Manual, User’s Guide, and Software Guide, U.S. Geological Survey, Reston, Virginia.